Making Backyards Affordable for All: A YIMBY Analysis

Author

Your Name

Published

October 19, 2025

Introduction

Housing affordability remains one of the most pressing challenges facing American cities today. This analysis examines metropolitan areas across the United States to identify “YIMBY” (Yes In My Backyard) success stories—cities that have effectively addressed housing affordability through permissive building policies. Using data from the US Census Bureau and Bureau of Labor Statistics, we develop metrics to measure rent burden and housing growth, ultimately identifying which cities have made meaningful progress in making housing more affordable.

Data Acquisition

Task 1: Data Import

Show code
# Load required packages
library(tidyverse)
library(DT)
library(scales)
library(tidycensus)
library(glue)
library(readxl)
library(httr2)
library(rvest)
library(gghighlight)

# Set up data directory
if(!dir.exists(file.path("data", "mp02"))){
    dir.create(file.path("data", "mp02"), showWarnings=FALSE, recursive=TRUE)
}

# Helper function for package installation
ensure_package <- function(pkg){
    pkg <- as.character(substitute(pkg))
    options(repos = c(CRAN = "https://cloud.r-project.org"))
    if(!require(pkg, character.only=TRUE, quietly=TRUE)) install.packages(pkg)
    stopifnot(require(pkg, character.only=TRUE, quietly=TRUE))
}

# Census API key setup (first time only)
# tidycensus::census_api_key("your_key_here", install = TRUE)

# Helper function to get ACS data across multiple years
get_acs_all_years <- function(variable, geography="cbsa",
                              start_year=2009, end_year=2023){
    fname <- glue("{variable}_{geography}_{start_year}_{end_year}.csv")
    fname <- file.path("data", "mp02", fname)
    
    if(!file.exists(fname)){
        YEARS <- seq(start_year, end_year)
        YEARS <- YEARS[YEARS != 2020] # Drop 2020 - No survey (covid)
        
        ALL_DATA <- map(YEARS, function(yy){
            tidycensus::get_acs(geography, variable, year=yy, survey="acs1") |>
                mutate(year=yy) |>
                select(-moe, -variable) |>
                rename(!!variable := estimate)
        }) |> bind_rows()
        
        write_csv(ALL_DATA, fname)
    }
    
    read_csv(fname, show_col_types=FALSE)
}

# Load Census ACS data
INCOME <- get_acs_all_years("B19013_001") |>
    rename(household_income = B19013_001)

RENT <- get_acs_all_years("B25064_001") |>
    rename(monthly_rent = B25064_001)

POPULATION <- get_acs_all_years("B01003_001") |>
    rename(population = B01003_001)

HOUSEHOLDS <- get_acs_all_years("B11001_001") |>
    rename(households = B11001_001)
Show code
# Helper function to get building permits data
get_building_permits <- function(start_year = 2009, end_year = 2023){
    fname <- glue("housing_units_{start_year}_{end_year}.csv")
    fname <- file.path("data", "mp02", fname)
    
    if(!file.exists(fname)){
        # Historical data (2009-2018) from text files
        HISTORICAL_YEARS <- seq(start_year, 2018)
        
        HISTORICAL_DATA <- map(HISTORICAL_YEARS, function(yy){
            historical_url <- glue("https://www.census.gov/construction/bps/txt/tb3u{yy}.txt")
                
            LINES <- readLines(historical_url)[-c(1:11)]
            CBSA_LINES <- str_detect(LINES, "^[[:digit:]]")
            CBSA <- as.integer(str_sub(LINES[CBSA_LINES], 5, 10))
            PERMIT_LINES <- str_detect(str_sub(LINES, 48, 53), "[[:digit:]]")
            PERMITS <- as.integer(str_sub(LINES[PERMIT_LINES], 48, 53))
            
            data.frame(CBSA = CBSA,
                       new_housing_units_permitted = PERMITS, 
                       year = yy)
        }) |> bind_rows()
        
        # Current data (2019-2023) from Excel files
        CURRENT_YEARS <- seq(2019, end_year)
        
        CURRENT_DATA <- map(CURRENT_YEARS, function(yy){
            current_url <- glue("https://www.census.gov/construction/bps/xls/msaannual_{yy}99.xls")
            temp <- tempfile()
            download.file(current_url, destfile = temp)
            
            # Fallback function for different Excel formats
            fallback <- function(.f1, .f2){
                function(...){
                    tryCatch(.f1(...), error=function(e) .f2(...))
                }
            }
            
            reader <- fallback(readxl::read_xlsx, readxl::read_xls)
            
            reader(temp, skip=5) |>
                na.omit() |>
                select(CBSA, Total) |>
                mutate(year = yy) |>
                rename(new_housing_units_permitted = Total)
        }) |> bind_rows()
        
        ALL_DATA <- rbind(HISTORICAL_DATA, CURRENT_DATA)
        write_csv(ALL_DATA, fname)
    }
    
    read_csv(fname, show_col_types=FALSE)
}

PERMITS <- get_building_permits()
Show code
# Helper function to get BLS industry codes
get_bls_industry_codes <- function(){
    fname <- file.path("data", "mp02", "bls_industry_codes.csv")
    
    if(!file.exists(fname)){
        resp <- httr2::request("https://www.bls.gov") |> 
            httr2::req_url_path("cew", "classifications", "industry", "industry-titles.htm") |>
            httr2::req_headers(`User-Agent` = "Mozilla/5.0") |> 
            httr2::req_error(is_error = \(resp) FALSE) |>
            httr2::req_perform()
        
        httr2::resp_check_status(resp)
        
        naics_table <- httr2::resp_body_html(resp) |>
            rvest::html_element("#naics_titles") |> 
            rvest::html_table() |>
            mutate(title = str_trim(str_remove(str_remove(`Industry Title`, Code), "NAICS"))) |>
            select(-`Industry Title`) |>
            mutate(depth = if_else(nchar(Code) <= 5, nchar(Code) - 1, NA)) |>
            filter(!is.na(depth))
        
        naics_table <- naics_table |> 
            filter(depth == 4) |> 
            rename(level4_title=title) |> 
            mutate(level1_code = str_sub(Code, end=2), 
                   level2_code = str_sub(Code, end=3), 
                   level3_code = str_sub(Code, end=4)) |>
            left_join(naics_table, join_by(level1_code == Code)) |>
            rename(level1_title=title) |>
            left_join(naics_table, join_by(level2_code == Code)) |>
            rename(level2_title=title) |>
            left_join(naics_table, join_by(level3_code == Code)) |>
            rename(level3_title=title) |>
            select(-starts_with("depth")) |>
            rename(level4_code = Code) |>
            select(level1_title, level2_title, level3_title, level4_title, 
                   level1_code, level2_code, level3_code, level4_code)
    
        write_csv(naics_table, fname)
    }
    
    read_csv(fname, show_col_types=FALSE)
}

# Helper function to get BLS QCEW wage data
get_bls_qcew_annual_averages <- function(start_year=2009, end_year=2023){
    fname <- glue("bls_qcew_{start_year}_{end_year}.csv")
    fname <- file.path("data", "mp02", fname)
    
    if(!file.exists(fname)){
        YEARS <- seq(start_year, end_year)
        YEARS <- YEARS[YEARS != 2020] # Drop Covid year to match ACS
        
        ALL_DATA <- map(YEARS, .progress=TRUE, function(yy){
            fname_inner <- file.path("data", "mp02", glue("{yy}_qcew_annual_singlefile.zip"))
            
            if(!file.exists(fname_inner)){
                httr2::request("https://www.bls.gov") |> 
                httr2::req_url_path("cew", "data", "files", yy, "csv",
                             glue("{yy}_annual_singlefile.zip")) |>
                httr2::req_headers(`User-Agent` = "Mozilla/5.0") |> 
                httr2::req_error(is_error = \(resp) FALSE) |>
                httr2::req_perform(fname_inner)
            }
            
            read_csv(fname_inner, show_col_types=FALSE) |> 
                mutate(YEAR = yy) |>
                select(area_fips, industry_code, annual_avg_emplvl, 
                       total_annual_wages, YEAR) |>
                filter(nchar(industry_code) <= 5, str_starts(area_fips, "C")) |>
                rename(FIPS = area_fips, INDUSTRY = industry_code, 
                       EMPLOYMENT = annual_avg_emplvl, 
                       TOTAL_WAGES = total_annual_wages) |>
                mutate(INDUSTRY = as.integer(INDUSTRY),
                       EMPLOYMENT = as.integer(EMPLOYMENT)) |>
                filter(INDUSTRY != 10) |> 
                mutate(AVG_WAGE = TOTAL_WAGES / EMPLOYMENT)
        }) |> bind_rows()
        
        write_csv(ALL_DATA, fname)
    }
    
    read_csv(fname, show_col_types=FALSE)
}

INDUSTRY_CODES <- get_bls_industry_codes()
WAGES <- get_bls_qcew_annual_averages()

Data Structure Overview

We have successfully imported five main datasets:

  • INCOME: Household income by CBSA and year
  • RENT: Monthly rent by CBSA and year
  • POPULATION: Total population by CBSA and year
  • HOUSEHOLDS: Number of households by CBSA and year
  • PERMITS: New housing units permitted by CBSA and year
  • WAGES: Employment and wage data by CBSA, industry, and year
  • INDUSTRY_CODES: NAICS industry code lookup table

Key Join Keys:

  • Census data uses GEOID (as character) and NAME (CBSA name)
  • BLS data uses FIPS (CBSA code as “C12345”)
  • Permits data uses CBSA (as integer)
  • All datasets include year for temporal joins

Data Integration and Initial Exploration

Task 2: Multi-Table Questions

Question 1: Largest Housing Permit Volume (2010-2019)

Which CBSA permitted the most new housing units from 2010-2019?

Show code
q1_result <- PERMITS |>
  filter(year >= 2010, year <= 2019) |>
  group_by(CBSA) |>
  summarize(total_permits = sum(new_housing_units_permitted, na.rm = TRUE)) |>
  arrange(desc(total_permits)) |>
  slice(1) |>
  left_join(POPULATION |> select(GEOID, NAME) |> distinct(),
            by = c("CBSA" = "GEOID"))

q1_result |>
  select(NAME, total_permits) |>
  mutate(total_permits = comma(total_permits, accuracy = 1)) |>
  datatable(
    caption = "CBSA with Most Housing Permits (2010-2019)",
    options = list(pageLength = 5, dom = 't'),
    colnames = c("Metropolitan Area", "Total Permits"),
    rownames = FALSE
  )

Answer: The Houston-Sugar Land-Baytown, TX Metro Area, Houston-The Woodlands-Sugar Land, TX Metro Area, Houston-Pasadena-The Woodlands, TX Metro Area permitted the largest number of new housing units between 2010 and 2019, with a total of 482,075, 482,075, 482,075 units.

Question 2: Albuquerque Peak Permitting Year

In what year did Albuquerque permit the most new housing units?

Show code
q2_result <- PERMITS |>
  filter(CBSA == 10740) |>
  arrange(desc(new_housing_units_permitted)) |>
  slice(1)

albuquerque_permits <- PERMITS |>
  filter(CBSA == 10740) |>
  arrange(year) |>
  mutate(new_housing_units_permitted = comma(new_housing_units_permitted, accuracy = 1))

albuquerque_permits |>
  datatable(
    caption = "Albuquerque Housing Permits by Year",
    options = list(pageLength = 15),
    colnames = c("CBSA Code", "Housing Permits", "Year"),
    rownames = FALSE
  )

Answer: Albuquerque permitted the most new housing units in 2021, with 4,021 permits issued.

Note: The data shows a notable decline in 2021, likely related to COVID-19 impacts on construction activity.

Question 3: Highest Average Income State (2015)

Which state had the highest average household income in 2015?

Show code
state_df <- data.frame(
  abb  = c(state.abb, "DC", "PR"),
  name = c(state.name, "District of Columbia", "Puerto Rico")
)

q3_result <- INCOME |>
  filter(year == 2015) |>
  left_join(HOUSEHOLDS |> filter(year == 2015), 
            by = c("GEOID", "NAME", "year")) |>
  mutate(state = str_extract(NAME, ", (.{2})", group = 1),
         total_income = household_income * households) |>
  group_by(state) |>
  summarize(
    total_income = sum(total_income, na.rm = TRUE),
    total_households = sum(households, na.rm = TRUE),
    avg_income = total_income / total_households,
    .groups = "drop"
  ) |>
  arrange(desc(avg_income)) |>
  left_join(state_df, by = c("state" = "abb"))

q3_result |>
  slice(1:10) |>
  select(name, avg_income, total_households) |>
  mutate(
    avg_income = dollar(avg_income, accuracy = 1),
    total_households = comma(total_households, accuracy = 1)
  ) |>
  datatable(
    caption = "Top 10 States by Average Household Income (2015)",
    options = list(pageLength = 10, dom = 't'),
    colnames = c("State", "Avg Income", "Total Households"),
    rownames = FALSE
  )

Answer: District of Columbia had the highest average household income in 2015 at $93,294.

Question 4: NYC Data Scientists Peak

In what year did New York City last have the most data scientists?

Show code
# Data scientists are NAICS code 5182
data_scientists <- WAGES |>
  filter(INDUSTRY == 5182) |>
  mutate(std_cbsa = paste0(FIPS, "0"))

q4_result <- data_scientists |>
  group_by(YEAR) |>
  slice_max(EMPLOYMENT, n = 1) |>
  ungroup() |>
  left_join(
    POPULATION |> 
      mutate(std_cbsa = paste0("C", GEOID)) |>
      select(std_cbsa, NAME) |>
      distinct(),
    by = "std_cbsa"
  ) |>
  mutate(EMPLOYMENT = comma(EMPLOYMENT, accuracy = 1))

q4_result |>
  select(YEAR, NAME, EMPLOYMENT) |>
  datatable(
    caption = "CBSA with Most Data Scientists by Year",
    options = list(pageLength = 15),
    colnames = c("Year", "Metropolitan Area", "Employment"),
    rownames = FALSE
  )

Answer: NYC last had the most data scientists in 2015.

Question 5: NYC Finance Wages

What was the peak year for finance sector wages in NYC?

Show code
# Finance and insurance is NAICS code 52
nyc_finance <- WAGES |>
  filter(str_starts(FIPS, "C35620")) |>
  mutate(is_finance = INDUSTRY == 52) |>
  group_by(YEAR) |>
  summarize(
    finance_wages = sum(TOTAL_WAGES[is_finance], na.rm = TRUE),
    total_wages = sum(TOTAL_WAGES, na.rm = TRUE),
    finance_fraction = finance_wages / total_wages,
    .groups = "drop"
  )

peak_year <- nyc_finance |>
  slice_max(finance_fraction, n = 1)

nyc_finance |>
  mutate(finance_fraction = percent(finance_fraction, accuracy = 0.1)) |>
  datatable(
    caption = "Finance Sector Wages as % of Total in NYC",
    options = list(pageLength = 15),
    colnames = c("Year", "Finance Wages", "Total Wages", "Finance %"),
    rownames = FALSE
  )

Answer: The finance and insurance sector peaked at of total NYC wages in .

Task 3: Initial Visualizations

Visualization 1: Rent vs. Household Income (2009)

Show code
rent_income_2009 <- RENT |>
  filter(year == 2009) |>
  inner_join(INCOME |> filter(year == 2009), 
             by = c("GEOID", "NAME", "year"))

ggplot(rent_income_2009, aes(x = household_income, y = monthly_rent)) +
  geom_point(alpha = 0.5, color = "#2C3E50", size = 2.5) +
  geom_smooth(method = "lm", color = "#3498DB", se = TRUE, linewidth = 1.2) +
  scale_x_continuous(labels = dollar_format(), limits = c(0, NA)) +
  scale_y_continuous(labels = dollar_format(), limits = c(0, NA)) +
  labs(
    title = "Relationship Between Household Income and Monthly Rent",
    subtitle = "Core-Based Statistical Areas (CBSAs) in 2009",
    x = "Average Household Income (Annual)",
    y = "Average Monthly Rent",
    caption = "Source: US Census Bureau, American Community Survey"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, color = "#2C3E50"),
    plot.subtitle = element_text(color = "#7F8C8D", margin = margin(b = 15)),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "#ECF0F1"),
    axis.title = element_text(color = "#34495E")
  )

Interpretation: There is a strong positive relationship between household income and monthly rent across CBSAs. Higher-income areas tend to have higher rents, suggesting that housing costs scale with local economic conditions.

Visualization 2: Total vs. Healthcare Employment Over Time

Show code
healthcare_employment <- WAGES |>
  mutate(is_healthcare = INDUSTRY == 62,
         std_cbsa = paste0(FIPS, "0")) |>
  group_by(std_cbsa, YEAR) |>
  summarize(
    healthcare_employment = sum(EMPLOYMENT[is_healthcare], na.rm = TRUE),
    total_employment = sum(EMPLOYMENT, na.rm = TRUE),
    .groups = "drop"
  ) |>
  filter(total_employment > 0, healthcare_employment > 0)

ggplot(healthcare_employment, 
       aes(x = total_employment, y = healthcare_employment, color = YEAR)) +
  geom_point(alpha = 0.5, size = 2) +
  scale_color_viridis_c(option = "viridis") +
  scale_x_log10(labels = comma_format()) +
  scale_y_log10(labels = comma_format()) +
  labs(
    title = "Healthcare Employment vs. Total Employment Across CBSAs",
    subtitle = "Evolution from 2009-2023 (log-log scale)",
    x = "Total Employment (log scale)",
    y = "Healthcare & Social Services Employment (log scale)",
    color = "Year",
    caption = "Source: Bureau of Labor Statistics, Quarterly Census of Employment and Wages"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, color = "#2C3E50"),
    plot.subtitle = element_text(color = "#7F8C8D", margin = margin(b = 15)),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "#ECF0F1")
  )

Interpretation: Healthcare employment grows proportionally with total employment across CBSAs. The color gradient shows the temporal evolution, with more recent years showing slightly higher healthcare employment shares, reflecting the sector’s continued growth.

Visualization 3: Household Size Evolution Over Time

Show code
household_size <- POPULATION |>
  inner_join(HOUSEHOLDS, by = c("GEOID", "NAME", "year")) |>
  mutate(household_size = population / households) |>
  filter(!is.na(household_size), household_size > 0)

top_cbsas <- POPULATION |>
  filter(year == 2019) |>
  slice_max(population, n = 20) |>
  pull(GEOID)

household_size_subset <- household_size |>
  filter(GEOID %in% top_cbsas)

ggplot(household_size_subset, aes(x = year, y = household_size, 
                                  group = NAME, color = NAME)) +
  geom_line(linewidth = 1) +
  gghighlight(
    NAME %in% c("New York-Newark-Jersey City, NY-NJ-PA Metro Area",
                "Los Angeles-Long Beach-Anaheim, CA Metro Area"),
    use_direct_label = TRUE,
    label_params = list(size = 3.5, nudge_y = 0.05, segment.color = "gray50")
  ) +
  scale_y_continuous(limits = c(2, 3.5)) +
  scale_color_manual(
    values = c(
      "New York-Newark-Jersey City, NY-NJ-PA Metro Area" = "#3498DB",
      "Los Angeles-Long Beach-Anaheim, CA Metro Area" = "#E74C3C"
    )
  ) +
  labs(
    title = "Evolution of Average Household Size Over Time",
    subtitle = "Top 20 largest US metropolitan areas (2009-2023), highlighting NYC and LA",
    x = "Year",
    y = "Average Household Size (persons per household)",
    caption = "Source: US Census Bureau, American Community Survey\nNote: 2020 data unavailable due to COVID-19"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, color = "#2C3E50"),
    plot.subtitle = element_text(color = "#7F8C8D", margin = margin(b = 15)),
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "#ECF0F1"),
    legend.position = "none"
  )

Interpretation: Household sizes have remained relatively stable over time, hovering around 2.5-3.0 persons per household, with modest variation across metropolitan areas. This stability suggests that demographic trends in household formation have been consistent despite economic changes.

Building Affordability Indices

Task 4: Rent Burden Index

Show code
rent_burden_data <- INCOME |>
  inner_join(RENT, by = c("GEOID", "NAME", "year")) |>
  mutate(
    monthly_income = household_income / 12,
    rent_to_income_ratio = monthly_rent / monthly_income
  )

baseline <- rent_burden_data |>
  filter(year == 2009) |>
  summarize(baseline_ratio = mean(rent_to_income_ratio, na.rm = TRUE)) |>
  pull(baseline_ratio)

rent_burden_data <- rent_burden_data |>
  mutate(rent_burden_index = (rent_to_income_ratio / baseline) * 100)

Rent Burden Index Definition:

The rent burden index is centered at 100, where:

  • 100 = the national average rent-to-income ratio in 2009 (baseline year)
  • Values above 100 indicate higher rent burden than the 2009 baseline
  • Values below 100 indicate lower rent burden than the 2009 baseline

This indexing allows for easy interpretation: a value of 120 means rent burden is 20% higher than the 2009 national average.

Single Metro Area: Rent Burden Over Time

Show code
nyc_rent_burden <- rent_burden_data |>
  filter(str_detect(NAME, "New York-Newark-Jersey")) |>
  arrange(year) |>
  mutate(
    monthly_rent = dollar(monthly_rent, accuracy = 1),
    monthly_income = dollar(monthly_income, accuracy = 1),
    rent_burden_index = round(rent_burden_index, 1)
  )

nyc_rent_burden |>
  select(year, monthly_rent, monthly_income, rent_burden_index) |>
  datatable(
    caption = "Rent Burden Evolution: NYC Metro Area",
    options = list(pageLength = 15),
    colnames = c("Year", "Monthly Rent", "Monthly Income", "Burden Index"),
    rownames = FALSE
  )

Highest and Lowest Rent Burden Areas

Show code
recent_year <- max(rent_burden_data$year)

rent_burden_extremes <- rent_burden_data |>
  filter(year == recent_year) |>
  arrange(desc(rent_burden_index))

highest_burden <- rent_burden_extremes |>
  slice(1:10) |>
  select(NAME, monthly_rent, household_income, rent_burden_index) |>
  mutate(
    monthly_rent = dollar(monthly_rent, accuracy = 1),
    household_income = dollar(household_income, accuracy = 1),
    rent_burden_index = round(rent_burden_index, 1)
  )

lowest_burden <- rent_burden_extremes |>
  slice_tail(n = 10) |>
  select(NAME, monthly_rent, household_income, rent_burden_index) |>
  mutate(
    monthly_rent = dollar(monthly_rent, accuracy = 1),
    household_income = dollar(household_income, accuracy = 1),
    rent_burden_index = round(rent_burden_index, 1)
  )

highest_burden |>
  datatable(
    caption = glue("Top 10 CBSAs with Highest Rent Burden ({recent_year})"),
    options = list(pageLength = 10, dom = 't'),
    colnames = c("Metro Area", "Monthly Rent", "HH Income", "Burden Index"),
    rownames = FALSE
  )
Show code
lowest_burden |>
  datatable(
    caption = glue("Top 10 CBSAs with Lowest Rent Burden ({recent_year})"),
    options = list(pageLength = 10, dom = 't'),
    colnames = c("Metro Area", "Monthly Rent", "HH Income", "Burden Index"),
    rownames = FALSE
  )

Task 5: Housing Growth Index

Show code
housing_growth_data <- POPULATION |>
  inner_join(PERMITS, by = c("GEOID" = "CBSA", "year")) |>
  arrange(GEOID, year) |>
  group_by(GEOID) |>
  mutate(
    pop_5yr_ago = lag(population, n = 5),
    pop_growth_5yr = population - pop_5yr_ago,
    pop_growth_rate_5yr = (population - pop_5yr_ago) / pop_5yr_ago,
    permits_per_1000 = (new_housing_units_permitted / population) * 1000,
    permits_per_new_resident = new_housing_units_permitted / pop_growth_5yr
  ) |>
  ungroup() |>
  filter(year >= 2014, !is.na(pop_5yr_ago))

housing_growth_data <- housing_growth_data |>
  group_by(year) |>
  mutate(
    instant_index = percent_rank(permits_per_1000) * 100,
    permits_per_new_resident_capped = case_when(
      pop_growth_5yr <= 0 ~ NA_real_,
      permits_per_new_resident > 10 ~ 10,
      TRUE ~ permits_per_new_resident
    ),
    rate_index = percent_rank(permits_per_new_resident_capped) * 100
  ) |>
  ungroup() |>
  mutate(
    composite_growth_index = (instant_index + coalesce(rate_index, instant_index)) / 2
  )

High Scorers: Instantaneous Growth

Show code
recent_instant <- housing_growth_data |>
  filter(year >= 2019) |>
  group_by(GEOID, NAME) |>
  summarize(
    avg_instant_index = mean(instant_index, na.rm = TRUE),
    avg_permits_per_1000 = mean(permits_per_1000, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(avg_instant_index)) |>
  slice(1:10) |>
  mutate(
    avg_instant_index = round(avg_instant_index, 1),
    avg_permits_per_1000 = round(avg_permits_per_1000, 2)
  )

recent_instant |>
  select(NAME, avg_instant_index, avg_permits_per_1000) |>
  datatable(
    caption = "Top 10 CBSAs: Instantaneous Housing Growth (2019-2023 avg)",
    options = list(pageLength = 10, dom = 't'),
    colnames = c("Metro Area", "Growth Index", "Permits per 1,000"),
    rownames = FALSE
  )

High Scorers: Rate-Based Growth

Show code
recent_rate <- housing_growth_data |>
  filter(year >= 2019, !is.na(rate_index)) |>
  group_by(GEOID, NAME) |>
  summarize(
    avg_rate_index = mean(rate_index, na.rm = TRUE),
    avg_permits_per_new_resident = mean(permits_per_new_resident_capped, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(avg_rate_index)) |>
  slice(1:10) |>
  mutate(
    avg_rate_index = round(avg_rate_index, 1),
    avg_permits_per_new_resident = round(avg_permits_per_new_resident, 2)
  )

recent_rate |>
  select(NAME, avg_rate_index, avg_permits_per_new_resident) |>
  datatable(
    caption = "Top 10 CBSAs: Rate-Based Housing Growth (2019-2023 avg)",
    options = list(pageLength = 10, dom = 't'),
    colnames = c("Metro Area", "Growth Index", "Permits per New Resident"),
    rownames = FALSE
  )

High Scorers: Composite Growth Index

Show code
recent_composite <- housing_growth_data |>
  filter(year >= 2019) |>
  group_by(GEOID, NAME) |>
  summarize(
    avg_composite_index = mean(composite_growth_index, na.rm = TRUE),
    avg_permits_per_1000 = mean(permits_per_1000, na.rm = TRUE),
    .groups = "drop"
  ) |>
  arrange(desc(avg_composite_index)) |>
  slice(1:15) |>
  mutate(
    avg_composite_index = round(avg_composite_index, 1),
    avg_permits_per_1000 = round(avg_permits_per_1000, 2)
  )

recent_composite |>
  select(NAME, avg_composite_index, avg_permits_per_1000) |>
  datatable(
    caption = "Top 15 CBSAs: Composite Housing Growth Index (2019-2023 avg)",
    options = list(pageLength = 15),
    colnames = c("Metro Area", "Composite Index", "Permits per 1,000"),
    rownames = FALSE
  )

Index Interpretation:

  • Instantaneous Index: Measures permits per 1,000 residents, showing which cities are building most actively relative to their current size
  • Rate-Based Index: Measures permits relative to population growth, identifying cities building faster than their population increases
  • Composite Index: Combines both measures to identify cities with strong overall housing development

Task 6: Identifying YIMBY Success Stories

Show code
yimby_data <- rent_burden_data |>
  left_join(housing_growth_data, by = c("GEOID", "NAME", "year")) |>
  group_by(GEOID) |>
  mutate(
    early_rent_burden = mean(rent_burden_index[year %in% 2009:2011], na.rm = TRUE),
    recent_rent_burden = mean(rent_burden_index[year %in% 2021:2023], na.rm = TRUE),
    rent_burden_change = recent_rent_burden - early_rent_burden,
    first_pop = first(population),
    last_pop = last(population),
    total_pop_growth = (last_pop - first_pop) / first_pop,
    recent_growth_index = mean(composite_growth_index[year >= 2019], na.rm = TRUE)
  ) |>
  ungroup() |>
  select(GEOID, NAME, early_rent_burden, recent_rent_burden, rent_burden_change,
         total_pop_growth, recent_growth_index) |>
  distinct()

median_early_burden <- median(yimby_data$early_rent_burden, na.rm = TRUE)
median_growth_index <- median(yimby_data$recent_growth_index, na.rm = TRUE)

yimby_successes <- yimby_data |>
  filter(
    early_rent_burden > median_early_burden,
    rent_burden_change < 0,
    total_pop_growth > 0,
    recent_growth_index > median_growth_index
  ) |>
  arrange(desc(recent_growth_index)) |>
  mutate(
    early_rent_burden = round(early_rent_burden, 1),
    recent_rent_burden = round(recent_rent_burden, 1),
    rent_burden_change = round(rent_burden_change, 1),
    total_pop_growth = percent(total_pop_growth, accuracy = 0.1),
    recent_growth_index = round(recent_growth_index, 1)
  )

yimby_successes |>
  datatable(
    caption = "YIMBY Success Stories: High Early Burden, Declining Burden, Growth",
    options = list(pageLength = 15),
    colnames = c("GEOID", "Metro Area", "Early Burden", "Recent Burden", 
                 "Burden Change", "Pop Growth", "Growth Index"),
    rownames = FALSE
  )

Visualization 1: Rent Burden vs. Housing Growth

Show code
viz_data <- yimby_data |>
  filter(!is.na(recent_growth_index), !is.na(recent_rent_burden))

viz_data <- viz_data |>
  mutate(
    is_yimby_success = GEOID %in% yimby_successes$GEOID,
    category = case_when(
      is_yimby_success ~ "YIMBY Success",
      recent_rent_burden > median_early_burden & recent_growth_index > median_growth_index ~ "High Burden, High Growth",
      recent_rent_burden > median_early_burden ~ "High Burden, Low Growth",
      recent_growth_index > median_growth_index ~ "Low Burden, High Growth",
      TRUE ~ "Other"
    )
  )

ggplot(viz_data, aes(x = recent_growth_index, y = recent_rent_burden, 
                     color = category, size = total_pop_growth)) +
  geom_point(alpha = 0.6) +
  scale_color_manual(values = c(
    "YIMBY Success" = "#27AE60",
    "High Burden, High Growth" = "#3498DB",
    "High Burden, Low Growth" = "#E74C3C",
    "Low Burden, High Growth" = "#9B59B6",
    "Other" = "#BDC3C7"
  )) +
  scale_size_continuous(labels = percent_format(), range = c(1, 8)) +
  labs(
    title = "Housing Growth vs. Rent Burden Across US Metro Areas",
    subtitle = "YIMBY successes show high growth with decreasing burden",
    x = "Composite Housing Growth Index (2019-2023 avg)",
    y = "Recent Rent Burden Index (2021-2023 avg, 100 = 2009 baseline)",
    color = "Metro Category",
    size = "Total Pop. Growth\n(2009-2023)",
    caption = "Source: US Census Bureau ACS and Building Permits Survey"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, color = "#2C3E50"),
    plot.subtitle = element_text(color = "#7F8C8D", margin = margin(b = 15)),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "#ECF0F1")
  )

Visualization 2: Rent Burden Change Over Time

Show code
# Get raw YIMBY successes data (before formatting)
yimby_successes_raw <- yimby_data |>
  filter(
    early_rent_burden > median_early_burden,
    rent_burden_change < 0,
    total_pop_growth > 0,
    recent_growth_index > median_growth_index
  ) |>
  arrange(desc(recent_growth_index))

top_yimby <- yimby_successes_raw |>
  slice(1:5) |>
  pull(GEOID)

high_burden_low_growth <- yimby_data |>
  filter(
    recent_rent_burden > median_early_burden,
    recent_growth_index < median_growth_index
  ) |>
  slice_max(recent_rent_burden, n = 3) |>
  pull(GEOID)

comparison_cities <- c(top_yimby, high_burden_low_growth)

rent_burden_comparison <- rent_burden_data |>
  filter(GEOID %in% comparison_cities) |>
  mutate(
    city_type = if_else(GEOID %in% top_yimby, "YIMBY Success", "High Burden City"),
    city_label = str_extract(NAME, "^[^,]+")
  )

ggplot(rent_burden_comparison, aes(x = year, y = rent_burden_index, 
                                    color = city_label, linetype = city_type)) +
  geom_line(linewidth = 1.1) +
  geom_hline(yintercept = 100, linetype = "dotted", color = "#7F8C8D", linewidth = 0.8) +
  scale_linetype_manual(values = c("YIMBY Success" = "solid", 
                                   "High Burden City" = "dashed")) +
  labs(
    title = "Evolution of Rent Burden: YIMBY Successes vs. High-Burden Cities",
    subtitle = "Index = 100 represents 2009 national average",
    x = "Year",
    y = "Rent Burden Index",
    color = "Metropolitan Area",
    linetype = "City Type",
    caption = "Source: US Census Bureau, American Community Survey"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15, color = "#2C3E50"),
    plot.subtitle = element_text(color = "#7F8C8D", margin = margin(b = 15)),
    legend.position = "right",
    panel.grid.minor = element_blank(),
    panel.grid.major = element_line(color = "#ECF0F1")
  )

Key Findings:

Our analysis identifies several metropolitan areas as YIMBY success stories. These cities started with relatively high rent burdens but managed to increase their housing supply while attracting population growth, leading to improved affordability over time. In contrast, many high-burden cities failed to build sufficient housing, resulting in persistent or worsening affordability challenges.

Policy Brief

Identifying Target Congressional Districts

Show code
# Get raw data for policy analysis
yimby_successes_for_policy <- yimby_data |>
  filter(
    early_rent_burden > median_early_burden,
    rent_burden_change < 0,
    total_pop_growth > 0,
    recent_growth_index > median_growth_index
  ) |>
  arrange(desc(recent_growth_index))

primary_sponsor_city <- yimby_successes_for_policy |> slice(1)
primary_city_name <- str_extract(primary_sponsor_city$NAME, "^[^,]+")
primary_state <- str_extract(primary_sponsor_city$NAME, ", (.{2})", group = 1)

nimby_cities <- yimby_data |>
  filter(
    recent_rent_burden > quantile(recent_rent_burden, 0.75, na.rm = TRUE),
    recent_growth_index < quantile(recent_growth_index, 0.25, na.rm = TRUE)
  ) |>
  arrange(desc(recent_rent_burden))

cosponsor_city <- nimby_cities |> slice(1)
cosponsor_city_name <- str_extract(cosponsor_city$NAME, "^[^,]+")
cosponsor_state <- str_extract(cosponsor_city$NAME, ", (.{2})", group = 1)

tribble(
  ~Role, ~City, ~State, ~`Rent Burden`, ~`Growth Index`, ~Rationale,
  "Primary Sponsor", 
  primary_city_name, 
  primary_state,
  round(primary_sponsor_city$recent_rent_burden, 1),
  round(primary_sponsor_city$recent_growth_index, 1),
  "YIMBY success with declining burden",
  
  "Co-Sponsor",
  cosponsor_city_name,
  cosponsor_state,
  round(cosponsor_city$recent_rent_burden, 1),
  round(cosponsor_city$recent_growth_index, 1),
  "High burden, needs YIMBY policies"
) |>
  datatable(
    caption = "Recommended Congressional Sponsors",
    options = list(pageLength = 5, dom = 't'),
    colnames = c("Role", "City", "State", "Rent Burden", "Growth Index", "Rationale"),
    rownames = FALSE
  )

Identifying Supportive Interest Groups

Show code
sponsor_cbsa_std <- paste0("C", str_pad(primary_sponsor_city$GEOID, 5, pad = "0"), "0")
cosponsor_cbsa_std <- paste0("C", str_pad(cosponsor_city$GEOID, 5, pad = "0"), "0")

city_employment <- WAGES |>
  filter(
    str_starts(FIPS, str_sub(sponsor_cbsa_std, 1, 6)) | 
    str_starts(FIPS, str_sub(cosponsor_cbsa_std, 1, 6)),
    YEAR == 2023
  ) |>
  mutate(city = if_else(str_starts(FIPS, str_sub(sponsor_cbsa_std, 1, 6)),
                        "sponsor", "cosponsor")) |>
  group_by(city, INDUSTRY) |>
  summarize(
    total_employment = sum(EMPLOYMENT, na.rm = TRUE),
    avg_wage = weighted.mean(AVG_WAGE, EMPLOYMENT, na.rm = TRUE),
    .groups = "drop"
  ) |>
  left_join(
    INDUSTRY_CODES |> 
      select(level2_code, level2_title) |> 
      distinct() |>
      mutate(INDUSTRY = as.integer(level2_code)),
    by = "INDUSTRY"
  )

industries_both_cities <- city_employment |>
  pivot_wider(
    id_cols = c(INDUSTRY, level2_title),
    names_from = city,
    values_from = c(total_employment, avg_wage),
    values_fill = 0
  )

if("total_employment_sponsor" %in% names(industries_both_cities) && 
   "total_employment_cosponsor" %in% names(industries_both_cities)) {
  
  industries_both_cities <- industries_both_cities |>
    mutate(
      min_employment = pmin(total_employment_sponsor, total_employment_cosponsor),
      total_employment = total_employment_sponsor + total_employment_cosponsor
    ) |>
    filter(min_employment > 1000) |>
    arrange(desc(min_employment)) |>
    mutate(
      total_employment_sponsor = comma(total_employment_sponsor, accuracy = 1),
      total_employment_cosponsor = comma(total_employment_cosponsor, accuracy = 1),
      avg_wage_sponsor = dollar(avg_wage_sponsor, accuracy = 1),
      avg_wage_cosponsor = dollar(avg_wage_cosponsor, accuracy = 1)
    )
  
  industries_both_cities |>
    slice(1:10) |>
    select(level2_title, total_employment_sponsor, total_employment_cosponsor,
           avg_wage_sponsor, avg_wage_cosponsor) |>
    datatable(
      caption = "Industries with Substantial Employment in Both Cities (2023)",
      options = list(pageLength = 10, dom = 't'),
      colnames = c("Industry", 
                  paste(primary_city_name, "Employment"),
                  paste(cosponsor_city_name, "Employment"),
                  paste(primary_city_name, "Avg Wage"),
                  paste(cosponsor_city_name, "Avg Wage")),
      rownames = FALSE
    )
}

Policy Brief Summary


TO: Congressional Representatives from , and Los Angeles-Long Beach-Anaheim, CA

FROM: National YIMBY Coalition

RE: Federal YIMBY Incentive Program - Proposed Legislation

DATE: October 19, 2025


Executive Summary

We propose federal legislation to incentivize local municipalities to adopt YIMBY (Yes In My Backyard) housing policies. This program would provide grants to cities that increase housing development relative to population growth, with the goal of improving housing affordability nationwide.

Why Your Districts Need This Bill

**** (Primary Sponsor): Your city is a YIMBY success story. With a housing growth index of (well above the national median), has demonstrated that permissive zoning works. Despite starting with a rent burden index of , your city has reduced this to through consistent housing development. This bill would provide federal recognition and additional resources to continue this momentum.

Los Angeles-Long Beach-Anaheim (Co-Sponsor): Your constituents face a rent burden index of 134 (significantly above the 2009 baseline of 100). With a housing growth index of only 32, your city has struggled to build sufficient housing to meet demand. This federal program would provide both incentive funding and technical assistance to jumpstart housing development and make your city more affordable for working families.

Building Labor and Industry Support

Multiple employment sectors in both districts would directly benefit from improved housing affordability. Lower rent burdens mean more disposable income for consumers, benefiting retail and service industries. For employers, improved housing affordability helps attract and retain talent while reducing pressure for wage increases.

Proposed Metrics for Federal Funding

The program would use two key metrics to identify qualifying cities and allocate funding:

Rent Burden Index: Measures the ratio of median rent to median household income, indexed to 2009 national levels (baseline = 100). This metric identifies which cities face the greatest affordability challenges and tracks improvement over time.

Housing Growth Index: A composite measure combining:

  • Instantaneous growth: New housing units permitted per 1,000 residents
  • Rate-based growth: New housing units relative to population growth

Cities showing improvement on these metrics qualify for tiered federal grants, with the largest rewards going to communities that successfully reduce rent burden while accommodating population growth.

Call to Action

We request your co-sponsorship of this legislation and ask that you leverage your relationships with local business and labor groups to build coalition support. Our analysis shows this program would benefit millions of Americans while rewarding cities that embrace housing abundance.


Conclusion

This analysis has identified metropolitan areas across the United States that have successfully implemented YIMBY-friendly housing policies, resulting in improved affordability despite population growth. By contrast, many high-cost cities have failed to build sufficient housing, exacerbating affordability crises.

Key Takeaways:

  1. YIMBY policies work: Cities with high housing growth rates have successfully reduced rent burdens even while experiencing population growth
  2. The problem is measurable: Our rent burden and housing growth indices provide clear, data-driven metrics for policy evaluation
  3. Federal incentives can help: A targeted grant program could encourage more cities to adopt pro-housing policies

The proposed federal YIMBY incentive program would reward cities that increase housing supply, providing both political cover for local leaders and financial resources to continue pro-housing policies. With support from key employment sectors and bipartisan appeal, this legislation represents a pragmatic approach to one of America’s most pressing challenges.

Appendix: Technical Notes

Data Sources

  • American Community Survey (ACS): Annual household income, rent, population, and household data from 2009-2023 (excluding 2020)
  • Census Building Permits Survey: New housing units permitted annually by CBSA
  • BLS Quarterly Census of Employment and Wages (QCEW): Employment and wage data by industry and geography

Metric Definitions

Rent Burden Index: \(\text{Index} = \frac{\text{Rent-to-Income Ratio}}{\text{2009 National Average}} \times 100\)

Housing Growth Indices:

  • Instantaneous: Percentile rank of permits per 1,000 residents
  • Rate-based: Percentile rank of permits per new resident (5-year lookback)
  • Composite: Average of instantaneous and rate-based indices

Reproducibility

All code and data sources are documented in this report. Data is automatically downloaded and cached locally. To reproduce this analysis, run this Quarto document with R and the required packages installed.


Analysis completed: October 19, 2025